Goto

Collaborating Authors

 boundary region


Decision-Path Patterns as Tree Reliability Signals: Path-based Adaptive Weighting for Random Forest Classification

arXiv.org Machine Learning

The global uniform aggregation of random forests leaves conditional bias along the decision boundary uncorrected. To correct this locally, we propose exploiting the structural pattern of each tree's decision path. At inference, a random forest reaches its prediction through the root-to-leaf path the sample traverses in each tree, so path-level reliability offers a finer granularity than tree-level weighting can access. We show that reliability varies meaningfully across path patterns in the boundary region identified by the forest itself, and that using this signal yields a statistically significant accuracy improvement over RF on 36 binary classification benchmarks (Wilcoxon p < 0.0001). We further devise a way to measure the sufficiency of residual information in the fitted RF's decision boundary, providing an estimate of the expected gain before the method is applied; on the qualifying group identified this way, the method delivers a mean +0.99 pp accuracy improvement with strict wins on every dataset (7/0/0). Class-recall regression -- the typical failure mode of RF correction methods -- is measured: zero minority-recall regressions and a single majority-recall regression at the 0.2 pp threshold, indicating that the correction operates in the direction of bias reduction rather than class trade-off. Our work suggests that the structural information of decision paths, previously overlooked in random forest research, can contribute to RF performance improvement.


Support-aware offline policy selection for advertising marketplaces

arXiv.org Machine Learning

Logged advertising auctions make offline reserve-price evaluation attractive but risky. Replay tables can identify policies with large apparent yield gains, yet they can also hide weak threshold support, multiple-comparison effects, subgroup harm, and bidder-response uncertainty. Existing replay and off-policy evaluation methods estimate or rank policy values, but they do not directly answer the operational question of whether the available evidence is strong enough to justify validation. This paper develops a support-aware offline decision framework for reserve-policy selection. Rather than outputting a single point-estimate winner, the framework converts logged evidence into a conservative decision object consisting of certified policies, statistically dominated alternatives, and unresolved candidates requiring further validation. The main theoretical result gives a unified finite-catalog guarantee showing that, under simultaneous uncertainty control and conservative support gates, the framework preserves the best gate-passing policy while eliminating only policies with certified regret. Supporting results characterize support-localized replay generalization, establish information-theoretic threshold-resolution limits, and quantify when heterogeneous bidder response can overturn localized replay rankings. Experiments on iPinYou real-time-bidding logs show that the leading reserve rule achieves a 47.66% replay lift in season two, a 40.71% simultaneous lower-bound lift, and a 43.87% frozen out-of-time replay lift in season three. The framework reduces a 19-policy catalog to a two-policy validation shortlist while certifying non-harm across 44 advertiser, exchange, and region segments. The results support the central claim that offline reserve-policy evaluation should produce certified validation decisions rather than point-estimate rankings alone.


CA-Edit: Causality-Aware Condition Adapter for High-Fidelity Local Facial Attribute Editing

arXiv.org Artificial Intelligence

For efficient and high-fidelity local facial attribute editing, most existing editing methods either require additional fine-tuning for different editing effects or tend to affect beyond the editing regions. Alternatively, inpainting methods can edit the target image region while preserving external areas. However, current inpainting methods still suffer from the generation misalignment with facial attributes description and the loss of facial skin details. To address these challenges, (i) a novel data utilization strategy is introduced to construct datasets consisting of attribute-text-image triples from a data-driven perspective, (ii) a Causality-Aware Condition Adapter is proposed to enhance the contextual causality modeling of specific details, which encodes the skin details from the original image while preventing conflicts between these cues and textual conditions. In addition, a Skin Transition Frequency Guidance technique is introduced for the local modeling of contextual causality via sampling guidance driven by low-frequency alignment. Extensive quantitative and qualitative experiments demonstrate the effectiveness of our method in boosting both fidelity and editability for localized attribute editing. The code is available at https://github.com/connorxian/CA-Edit.


Granular-Balls based Fuzzy Twin Support Vector Machine for Classification

arXiv.org Artificial Intelligence

The twin support vector machine (TWSVM) classifier has attracted increasing attention because of its low computational complexity. However, its performance tends to degrade when samples are affected by noise. The granular-ball fuzzy support vector machine (GBFSVM) classifier partly alleviates the adverse effects of noise, but it relies solely on the distance between the granular-ball's center and the class center to design the granular-ball membership function. In this paper, we first introduce the granular-ball twin support vector machine (GBTWSVM) classifier, which integrates granular-ball computing (GBC) with the twin support vector machine (TWSVM) classifier. By replacing traditional point inputs with granular-balls, we demonstrate how to derive a pair of non-parallel hyperplanes for the GBTWSVM classifier by solving a quadratic programming problem. Subsequently, we design the membership and non-membership functions of granular-balls using Pythagorean fuzzy sets to differentiate the contributions of granular-balls in various regions. Additionally, we develop the granular-ball fuzzy twin support vector machine (GBFTSVM) classifier by incorporating GBC with the fuzzy twin support vector machine (FTSVM) classifier. We demonstrate how to derive a pair of non-parallel hyperplanes for the GBFTSVM classifier by solving a quadratic programming problem. We also design algorithms for the GBTSVM classifier and the GBFTSVM classifier. Finally, the superior classification performance of the GBTWSVM classifier and the GBFTSVM classifier on 20 benchmark datasets underscores their scalability, efficiency, and robustness in tackling classification tasks.


Generalizing Machine Learning Evaluation through the Integration of Shannon Entropy and Rough Set Theory

arXiv.org Artificial Intelligence

This research paper delves into the innovative integration of Shannon entropy and rough set theory, presenting a novel approach to generalize the evaluation approach in machine learning. The conventional application of entropy, primarily focused on information uncertainty, is extended through its combination with rough set theory to offer a deeper insight into data's intrinsic structure and the interpretability of machine learning models. We introduce a comprehensive framework that synergizes the granularity of rough set theory with the uncertainty quantification of Shannon entropy, applied across a spectrum of machine learning algorithms. Our methodology is rigorously tested on various datasets, showcasing its capability to not only assess predictive performance but also to illuminate the underlying data complexity and model robustness. The results underscore the utility of this integrated approach in enhancing the evaluation landscape of machine learning, offering a multi-faceted perspective that balances accuracy with a profound understanding of data attributes and model dynamics. This paper contributes a groundbreaking perspective to machine learning evaluation, proposing a method that encapsulates a holistic view of model performance, thereby facilitating more informed decision-making in model selection and application.


Interactive Multi-Robot Flocking with Gesture Responsiveness and Musical Accompaniment

arXiv.org Artificial Intelligence

For decades, robotics researchers have pursued various tasks for multi-robot systems, from cooperative manipulation to search and rescue. These tasks are multi-robot extensions of classical robotic tasks and often optimized on dimensions such as speed or efficiency. As robots transition from commercial and research settings into everyday environments, social task aims such as engagement or entertainment become increasingly relevant. This work presents a compelling multi-robot task, in which the main aim is to enthrall and interest. In this task, the goal is for a human to be drawn to move alongside and participate in a dynamic, expressive robot flock. Towards this aim, the research team created algorithms for robot movements and engaging interaction modes such as gestures and sound. The contributions are as follows: (1) a novel group navigation algorithm involving human and robot agents, (2) a gesture responsive algorithm for real-time, human-robot flocking interaction, (3) a weight mode characterization system for modifying flocking behavior, and (4) a method of encoding a choreographer's preferences inside a dynamic, adaptive, learned system. An experiment was performed to understand individual human behavior while interacting with the flock under three conditions: weight modes selected by a human choreographer, a learned model, or subset list. Results from the experiment showed that the perception of the experience was not influenced by the weight mode selection. This work elucidates how differing task aims such as engagement manifest in multi-robot system design and execution, and broadens the domain of multi-robot tasks.


Three-way Decisions with Evaluative Linguistic Expressions

arXiv.org Artificial Intelligence

The theory of three-way decisions (TWD) divides a finite and non-empty universe into three disjoint sets, which are called positive, negative, and boundary regions. These regions respectively induce positive, negative, and boundary rules: a positive rule makes a decision of acceptance, a negative rule makes a decision of rejection, and a boundary rule makes an abstained or non-committed decision [1, 2]. The concept of three-way decisions was originally introduced in Rough Set Theory [1, 3] and until today, it has been widely studied and applied to many decision-making problems (see [4, 5, 6, 7] for some examples). Thus, several approaches have been proposed to generate the three regions; one of them is based on probabilistic rough sets, which generalizes probabilistic rough sets [8, 9] where the three regions are constructed using a pair of thresholds and the notion of conditional probability (in this case, the regions are called probabilistic positive, negative, and boundary regions). The contribution of this article is to provide a linguistic interpretation of the positive, negative, and boundary regions.


High-Resolution Boundary Detection for Medical Image Segmentation with Piece-Wise Two-Sample T-Test Augmented Loss

arXiv.org Artificial Intelligence

Fully automatic segmentation methods, such as liver and liver tumor segmentation, brain and brain tumor segmentation, optic disc segmentation, cell segmentation, lung segmentation, pulmonary nodule segmentation, and cardiac image segmentation [2], are essential for the diagnosis of serious diseases [3]. Therefore, it is important to improve the efficiency and accuracy of medical image segmentation methods. Medical image segmentation involves segmenting specific organs (e.g., the pancreas, liver, and bladder), determining certain functional parts of an organ (e.g., cardiac segmentation and retinal vessel segmentation), and identifying tumors in the organs. Medical images can generally be categorized according to the imaging technology and data form. Imaging technology includes X-ray photos, computed tomography, magnetic resonance imaging (MRI), and ultrasound imaging. Raw measurements are transformed into pixelated imaging data as part of the standard process. Although the original data are mostly three-dimensional images, two-dimensional slices are often created according to clinical procedure protocols that target specific applications. Most medical image segmentation methods are designed for two-dimensional slices.


An Efficient and Accurate Rough Set for Feature Selection, Classification and Knowledge Representation

arXiv.org Artificial Intelligence

This paper present a strong data mining method based on rough set, which can realize feature selection, classification and knowledge representation at the same time. Rough set has good interpretability, and is a popular method for feature selections. But low efficiency and low accuracy are its main drawbacks that limits its application ability. In this paper,corresponding to the accuracy, we first find the ineffectiveness of rough set because of overfitting, especially in processing noise attribute, and propose a robust measurement for an attribute, called relative importance.we proposed the concept of "rough concept tree" for knowledge representation and classification. Experimental results on public benchmark data sets show that the proposed framework achieves higher accurcy than seven popular or the state-of-the-art feature selection methods.


The Importance of Location in Real Estate, Weather, and Machine Learning

#artificialintelligence

Real estate experts like to say that the three most important features of a property are: location, location, location! Likewise, weather events are highly location-dependent. We will see below how a similar perspective is also applicable to machine learning algorithms. In real estate, the buyer is first and foremost concerned about location for at least 3 reasons: (a) the desirability of the surrounding neighborhood; (b) the proximity to schools, businesses, services, etc.; and (c) the value of properties in that area. Similarly, meteorologists tell us that all weather is local.